Introduction

This report analyzes the GSE114260 dataset. Detailed information about the dataset is given below.

Platform title: Illumina HiSeq 2000 (Homo sapiens)

Submission date: Nov 02 2010

Last update date: Mar 27 2019

Organism: Homo sapiens

Number of GEO datasets that use this technology: 7897

Number of GEO samples that use this technology: 122103

According to the corresponding paper, "combined" in a file name means that the file contains samples from both human and mouse. Since we only care about human samples, we do not need the two combined files. Therefore, the stranded_read_counts file is the data file of interest.

To clarify, ER in this dataset stands for estrogen receptor, not endoplasmic reticulum. The dataset has 2 groups with 3 replicates each, so 6 samples in total. C4-12ERaERE cells (an ER-lacking cell line stably transfected with ERaERE) relative to the MCF7 cells were used for RNA-seq analysis. The treatment group undergoes paroxetine and estrogen (E2) treatment, while the control group receives only paroxetine treatment.

Raw data were normalized using a trimmed-mean approach. We then performed thresholded over-representation analysis (ORA) with g:Profiler on the normalized dataset. The enrichment results support the conclusion of the original paper. The top gene returned is ESR1 and the top term returned is ‘GO:0006614 SRP-dependent cotranslational protein targeting to membrane’. According to the gene summary provided by HGNC, ESR1 encodes an estrogen receptor. In addition, there is a paper showing that the term ‘GO:0006614 SRP-dependent cotranslational protein targeting to membrane’ is related to breast cancer. Since the original paper concludes that the response to a certain breast cancer drug depends on ER, its conclusion is supported.

Top 10 g:Profiler ORA results
source term_name term_id adjusted_p_value
GO:BP SRP-dependent cotranslational protein targeting to membrane GO:0006614 0
GO:BP viral transcription GO:0019083 0
GO:BP cotranslational protein targeting to membrane GO:0006613 0
GO:BP viral gene expression GO:0019080 0
GO:BP viral process GO:0016032 0
GO:BP symbiotic process GO:0044403 0
GO:BP protein localization to endoplasmic reticulum GO:0070972 0
GO:BP nuclear-transcribed mRNA catabolic process, nonsense-mediated decay GO:0000184 0
GO:BP protein targeting to ER GO:0045047 0
GO:BP establishment of protein localization to endoplasmic reticulum GO:0072599 0
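As a rough illustration of what a thresholded over-representation test computes for each term, the sketch below evaluates an upper-tail hypergeometric probability. It is a minimal sketch, not g:Profiler's implementation: g:Profiler applies its own multiple-testing correction, which is not reproduced here, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def ora_p_value(overlap, set_size, query_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of gene-set members found when query_size genes are
    drawn without replacement from background_size genes, of which
    set_size belong to the gene set.
    """
    max_overlap = min(set_size, query_size)
    total = comb(background_size, query_size)
    favourable = sum(
        comb(set_size, i) * comb(background_size - set_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / total


# Hypothetical numbers, not taken from the dataset:
# 20 background genes, 5 in the gene set, 4 genes in the thresholded list,
# 3 of which fall in the gene set.
p = ora_p_value(overlap=3, set_size=5, query_size=4, background_size=20)
print(f"P(X >= 3) = {p:.4f}")
```

A small probability means the thresholded list contains more members of the gene set than expected by chance; the adjusted p-values in the table above come from applying a correction across all tested terms.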

Non-thresholded Gene Set Enrichment Analysis

Method

We chose the GSEA (version 4.0.3) preranked analysis (Subramanian et al. 2005) because we are working with a ranked gene list. The gene set database is from the Bader Lab, published April 1st, 2020 (Merico et al. 2010). Gene set permutation is the default and only choice for preranked analysis in GSEA. The minimum gene set size is left at the GSEA default of 15, while the maximum gene set size is reduced to 200 in order to reduce runtime. The number of permutations is 1000.
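To make the preranked analysis more concrete, the following is a minimal sketch of the weighted running-sum enrichment score described by Subramanian et al. (2005), together with a gene set size filter using the 15 and 200 limits mentioned above. It is not the GSEA software's code: the function names are hypothetical, the toy ranked list is invented, treating both size limits as inclusive is an assumption, and the permutation step that produces NES, p-values and FDR is omitted.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked: list of (gene, score) pairs sorted by descending score,
            with each gene appearing once.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    gene_set = set(gene_set)
    hit_weights = [abs(s) ** weight for g, s in ranked if g in gene_set]
    n_miss = len(ranked) - len(hit_weights)
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** weight / total_hit  # step up on a hit
        else:
            running -= 1.0 / n_miss                      # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def within_size_limits(gene_set, ranked, min_size=15, max_size=200):
    """Size filter applied to the genes that are present in the ranked list.

    Treating both limits as inclusive is an assumption of this sketch.
    """
    ranked_genes = {g for g, _ in ranked}
    size = len(set(gene_set) & ranked_genes)
    return min_size <= size <= max_size


# Invented toy ranking (gene names are placeholders).
toy_ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
              ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(toy_ranked, {"A", "B"}))  # set at the top of the list
print(enrichment_score(toy_ranked, {"E", "F"}))  # set at the bottom of the list
```

A gene set concentrated at the top of the ranked list gives a positive score and one concentrated at the bottom gives a negative score, which matches the sign convention of the ES column in the result tables.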

Result

Summary

Upregulated means upregulated in the treated samples, and downregulated means downregulated in the treated samples.

In the upregulated category, the top gene set returned is VIRAL MRNA TRANSLATION%REACTOME%R-HSA-192823.3, p-value = 0.000, ES = 0.86, NES = 3.01, FDR = 0.000.

In the downregulated category, the top gene set returned is HALLMARK_ESTROGEN_RESPONSE_EARLY%MSIGDB_C2%HALLMARK_ESTROGEN_RESPONSE_EARLY, p-value = 0.000, ES = -0.77, NES = -2.69, FDR = 0.000.

Top 10 upregulated from GSEA

NAME GS.br..follow.link.to.MSigDB SIZE ES NES NOM.p.val FDR.q.val FWER.p.val RANK.AT.MAX LEADING.EDGE
VIRAL MRNA TRANSLATION%REACTOME%R-HSA-192823.3 VIRAL MRNA TRANSLATION%REACTOME%R-HSA-192823.3 84 0.8643032 3.007869 0 0 0 1179 tags=74%, list=9%, signal=80%
PEPTIDE CHAIN ELONGATION%REACTOME%R-HSA-156902.2 PEPTIDE CHAIN ELONGATION%REACTOME%R-HSA-156902.2 84 0.8640424 2.994553 0 0 0 1179 tags=75%, list=9%, signal=82%
EUKARYOTIC TRANSLATION ELONGATION%REACTOME%R-HSA-156842.2 EUKARYOTIC TRANSLATION ELONGATION%REACTOME%R-HSA-156842.2 88 0.8542727 2.989950 0 0 0 1179 tags=72%, list=9%, signal=78%
SELENOCYSTEINE SYNTHESIS%REACTOME%R-HSA-2408557.2 SELENOCYSTEINE SYNTHESIS%REACTOME%R-HSA-2408557.2 87 0.8538864 2.988714 0 0 0 1179 tags=71%, list=9%, signal=78%
EUKARYOTIC TRANSLATION TERMINATION%REACTOME%R-HSA-72764.4 EUKARYOTIC TRANSLATION TERMINATION%REACTOME%R-HSA-72764.4 88 0.8575868 2.967159 0 0 0 1179 tags=70%, list=9%, signal=77%
CYTOPLASMIC RIBOSOMAL PROTEINS%WIKIPATHWAYS_20200310%WP477%HOMO SAPIENS CYTOPLASMIC RIBOSOMAL PROTEINS%WIKIPATHWAYS_20200310%WP477%HOMO SAPIENS 84 0.8384490 2.946797 0 0 0 1179 tags=70%, list=9%, signal=76%
FORMATION OF A POOL OF FREE 40S SUBUNITS%REACTOME DATABASE ID RELEASE 72%72689 FORMATION OF A POOL OF FREE 40S SUBUNITS%REACTOME DATABASE ID RELEASE 72%72689 95 0.8257862 2.941885 0 0 0 1203 tags=69%, list=9%, signal=76%
COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%GOBP%GO:0006613 COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%GOBP%GO:0006613 92 0.8197061 2.934378 0 0 0 1179 tags=66%, list=9%, signal=72%
SELENOAMINO ACID METABOLISM%REACTOME DATABASE ID RELEASE 72%2408522 SELENOAMINO ACID METABOLISM%REACTOME DATABASE ID RELEASE 72%2408522 105 0.8209368 2.928979 0 0 0 1179 tags=61%, list=9%, signal=66%
SRP-DEPENDENT COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%REACTOME%R-HSA-1799339.2 SRP-DEPENDENT COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%REACTOME%R-HSA-1799339.2 106 0.8149124 2.921791 0 0 0 1179 tags=63%, list=9%, signal=69%

Top 20 downregulated from GSEA

NAME GS.br..follow.link.to.MSigDB SIZE ES NES NOM.p.val FDR.q.val FWER.p.val RANK.AT.MAX LEADING.EDGE
HALLMARK_ESTROGEN_RESPONSE_EARLY%MSIGDB_C2%HALLMARK_ESTROGEN_RESPONSE_EARLY HALLMARK_ESTROGEN_RESPONSE_EARLY%MSIGDB_C2%HALLMARK_ESTROGEN_RESPONSE_EARLY 173 -0.7737817 -2.686789 0.0000000 0.0000000 0.000 1704 tags=51%, list=13%, signal=58%
HALLMARK_ESTROGEN_RESPONSE_LATE%MSIGDB_C2%HALLMARK_ESTROGEN_RESPONSE_LATE HALLMARK_ESTROGEN_RESPONSE_LATE%MSIGDB_C2%HALLMARK_ESTROGEN_RESPONSE_LATE 157 -0.6456295 -2.241304 0.0000000 0.0000000 0.000 1053 tags=31%, list=8%, signal=33%
PATHWAYS AFFECTED IN ADENOID CYSTIC CARCINOMA%WIKIPATHWAYS_20200310%WP3651%HOMO SAPIENS PATHWAYS AFFECTED IN ADENOID CYSTIC CARCINOMA%WIKIPATHWAYS_20200310%WP3651%HOMO SAPIENS 56 -0.7533783 -2.235558 0.0000000 0.0000000 0.000 1273 tags=50%, list=9%, signal=55%
EYE MORPHOGENESIS%GOBP%GO:0048592 EYE MORPHOGENESIS%GOBP%GO:0048592 56 -0.6926590 -2.060055 0.0000000 0.0038307 0.014 1269 tags=25%, list=9%, signal=27%
VISUAL SYSTEM DEVELOPMENT%GOBP%GO:0150063 VISUAL SYSTEM DEVELOPMENT%GOBP%GO:0150063 111 -0.6208371 -2.058356 0.0000000 0.0030645 0.014 2187 tags=32%, list=16%, signal=38%
CAMERA-TYPE EYE DEVELOPMENT%GOBP%GO:0043010 CAMERA-TYPE EYE DEVELOPMENT%GOBP%GO:0043010 81 -0.6463426 -2.053742 0.0000000 0.0025538 0.014 403 tags=16%, list=3%, signal=16%
HISTONE MODIFICATIONS%WIKIPATHWAYS_20200310%WP2369%HOMO SAPIENS HISTONE MODIFICATIONS%WIKIPATHWAYS_20200310%WP2369%HOMO SAPIENS 39 -0.7408906 -2.038358 0.0000000 0.0029178 0.019 1004 tags=36%, list=7%, signal=39%
EYE DEVELOPMENT%GOBP%GO:0001654 EYE DEVELOPMENT%GOBP%GO:0001654 110 -0.6154625 -2.036570 0.0000000 0.0029324 0.022 2187 tags=32%, list=16%, signal=38%
SENSORY ORGAN MORPHOGENESIS%GOBP%GO:0090596 SENSORY ORGAN MORPHOGENESIS%GOBP%GO:0090596 81 -0.6355954 -2.026007 0.0000000 0.0038485 0.032 1269 tags=22%, list=9%, signal=24%
SENSORY SYSTEM DEVELOPMENT%GOBP%GO:0048880 SENSORY SYSTEM DEVELOPMENT%GOBP%GO:0048880 115 -0.6104676 -2.022075 0.0000000 0.0036647 0.034 2187 tags=31%, list=16%, signal=37%
LXR-MEDIATED SIGNALING%REACTOME DATABASE ID RELEASE 72%9024446 LXR-MEDIATED SIGNALING%REACTOME DATABASE ID RELEASE 72%9024446 36 -0.7204237 -1.989345 0.0000000 0.0087138 0.089 824 tags=36%, list=6%, signal=38%
CAMERA-TYPE EYE MORPHOGENESIS%GOBP%GO:0048593 CAMERA-TYPE EYE MORPHOGENESIS%GOBP%GO:0048593 41 -0.7044234 -1.976283 0.0017182 0.0113972 0.128 403 tags=17%, list=3%, signal=18%
SENSORY ORGAN DEVELOPMENT%GOBP%GO:0007423 SENSORY ORGAN DEVELOPMENT%GOBP%GO:0007423 147 -0.5809889 -1.973475 0.0000000 0.0111504 0.136 2187 tags=29%, list=16%, signal=34%
HISTONE METHYLATION%GOBP%GO:0016571 HISTONE METHYLATION%GOBP%GO:0016571 56 -0.6673597 -1.964474 0.0000000 0.0124588 0.159 848 tags=29%, list=6%, signal=30%
PKMTS METHYLATE HISTONE LYSINES%REACTOME DATABASE ID RELEASE 72%3214841 PKMTS METHYLATE HISTONE LYSINES%REACTOME DATABASE ID RELEASE 72%3214841 42 -0.6924467 -1.962205 0.0000000 0.0118301 0.162 1004 tags=31%, list=7%, signal=33%
CELL-SUBSTRATE JUNCTION ASSEMBLY%GOBP%GO:0007044 CELL-SUBSTRATE JUNCTION ASSEMBLY%GOBP%GO:0007044 26 -0.7598497 -1.954675 0.0017271 0.0134554 0.191 1049 tags=38%, list=8%, signal=42%
REGULATION OF MECP2 EXPRESSION AND ACTIVITY%REACTOME%R-HSA-9022692.1 REGULATION OF MECP2 EXPRESSION AND ACTIVITY%REACTOME%R-HSA-9022692.1 28 -0.7356510 -1.953026 0.0000000 0.0133243 0.200 1185 tags=39%, list=9%, signal=43%
MUSCLE CELL DIFFERENTIATION%GOBP%GO:0042692 MUSCLE CELL DIFFERENTIATION%GOBP%GO:0042692 84 -0.6097430 -1.945164 0.0000000 0.0145124 0.228 1138 tags=25%, list=8%, signal=27%
CELLULAR GLUCOSE HOMEOSTASIS%GOBP%GO:0001678 CELLULAR GLUCOSE HOMEOSTASIS%GOBP%GO:0001678 37 -0.7022272 -1.938401 0.0000000 0.0157346 0.256 2153 tags=49%, list=16%, signal=58%
RETINA DEVELOPMENT IN CAMERA-TYPE EYE%GOBP%GO:0060041 RETINA DEVELOPMENT IN CAMERA-TYPE EYE%GOBP%GO:0060041 40 -0.6929891 -1.932622 0.0000000 0.0167860 0.282 403 tags=20%, list=3%, signal=21%

Comparison

Compared with the result from the thresholded over-representation analysis, there are several common terms.

Between the upregulated GSEA result and the ORA result, the common terms are: SRP-dependent cotranslational protein targeting to membrane, cotranslational protein targeting to membrane, protein targeting to ER, and establishment of protein localization to endoplasmic reticulum. Moreover, the ORA result contains the term ‘translational initiation’, while the upregulated GSEA result contains ‘CAP-dependent translational initiation’ and ‘eukaryotic translation initiation’.

There are no common terms between the top 20 downregulated GSEA terms and the ORA result.

We also checked the genes common to both results. The number of common genes is given in the Venn diagram, and the detailed list of common genes follows.

This is a straightforward comparison.
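The term comparison described above can be sketched with plain set operations. The snippet below is only an illustration under stated assumptions: the helper names are hypothetical, matching is done on the term name alone (the text before the first `%` in a GSEA gene set name, lower-cased) rather than on identifiers, and only a few names from the result tables are used as input.

```python
def normalize_term(name):
    """Reduce a GSEA-style 'NAME%SOURCE%ID' label to a lower-case term name."""
    return name.split("%")[0].strip().lower()


def common_terms(gsea_names, ora_names):
    """Return the sorted term names present in both result lists."""
    gsea = {normalize_term(n) for n in gsea_names}
    ora = {normalize_term(n) for n in ora_names}
    return sorted(gsea & ora)


# A few names copied from the result tables in this report.
gsea_up = [
    "VIRAL MRNA TRANSLATION%REACTOME%R-HSA-192823.3",
    "COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%GOBP%GO:0006613",
    "SRP-DEPENDENT COTRANSLATIONAL PROTEIN TARGETING TO MEMBRANE%REACTOME%R-HSA-1799339.2",
]
ora_terms = [
    "SRP-dependent cotranslational protein targeting to membrane",
    "viral transcription",
    "cotranslational protein targeting to membrane",
]

shared = common_terms(gsea_up, ora_terms)
print(shared)
```

Matching on names treats a Reactome pathway and a GO term with the same name as the same term, which is a simplification; the same set intersection works for the gene lists behind the Venn diagram.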

Common Genes between GSEA result and ORA result
Upregulated Upregulated cont Upregulated cont Upregulated cont Upregulated cont Downregulated
RPS18 RPL30 RPL3 RPLP2 RPS7 SMARCA4
RPL27 RPL37A RPL13A RPL8 RPS3A EP300
RPS15A RPS21 SRP9 RPLP0 NUP37 CTBP1
RPS14 RPS11 RPL38 RPL13 NUP54 CARM1
RPS3 RPL35A RPL18 RPL28 PSMC3 CAV1
RPL24 RPL27A RPL7A RPL5 ARL6IP1 BCL2
RPS12 RPL10A RPL36 RPL32 UBB NOTCH1
RPS13 RPL12 RPS8 RPS15 PSMB1 CREBBP
RPS5 RPSA RPL35 RPL23A TCEB1 SRCAP
RPL26 RPS4X RPS25 RPL39 PSMA3 HIPK2
RPS6 RPS27A RPS9 RPS19 RBX1 SLC22A5
RPS20 RPS24 RPL31 RPL11 PSMB3 INHBB
RPS16 RPL9 RPL29 UBA52 EIF4A2 ANK2
The remaining upregulated genes common to both results are PSMA4, PSMB4 and EIF3E.

Cytoscape

Although I can run GSEA successfully in the Docker container, for reasons I do not know, when I tried to run Cytoscape from code inside the Docker container, it always returned:

"Error in curl::curl_fetch_memory(url, handle = handle) :

Failed to connect to localhost port 1234: Connection refused"

So I decided to create the HTML notebook for the Cytoscape pipeline outside the container and then merge the resultant image files directly into the final HTML report. The Cytoscape pipeline file is also submitted, named EM pipeline.Rmd. Since the GSEA result created by running the code above in the Docker container caused problems because of the in-Docker file path, I first ran the GSEA code chunk outside of Docker and then ran Cytoscape on that GSEA result. I set the path to the GSEA result manually.
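The "Connection refused" message above means nothing was accepting connections on port 1234 at the address the code tried. One possible explanation, which this report does not confirm, is that inside a container `localhost` refers to the container itself rather than to the machine running Cytoscape. The sketch below is a minimal way to check whether a host and port accept TCP connections before running the pipeline; the function name is hypothetical, and the port is taken from the error message.

```python
import socket


def port_reachable(host="localhost", port=1234, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Port 1234 is the one named in the error message above.
    # Assumption: on some Docker setups the host machine is reachable from
    # inside a container under a different name (for example
    # "host.docker.internal" on Docker Desktop); pass that as `host` to test it.
    print("localhost:1234 reachable:", port_reachable())
```

If the check fails from inside the container but succeeds on the host, the problem is the address being used rather than Cytoscape itself.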

Original EM before Manual Layout

This enrichment map was created with a p-value cutoff of 0.005 and an FDR q-value cutoff of 0.005. This q-value was selected to reduce the size of the network.

This enrichment map includes 224 nodes and 4610 edges. 24 of 356 nodes are isolated nodes.

Red nodes represent upregulated gene sets, and blue nodes represent downregulated gene sets.
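The effect of the two cutoffs can be illustrated by applying them to a few rows of the downregulated GSEA table above. This is a sketch only: the function name is hypothetical, and treating the cutoffs as inclusive (less than or equal) is an assumption about how the filter behaves.

```python
def passes_cutoffs(nom_p, fdr_q, p_cutoff=0.005, q_cutoff=0.005):
    """Keep a gene set only if both its p-value and FDR q-value are small enough."""
    return nom_p <= p_cutoff and fdr_q <= q_cutoff


# (name, NOM.p.val, FDR.q.val) copied from the downregulated GSEA table.
rows = [
    ("HALLMARK_ESTROGEN_RESPONSE_EARLY", 0.0, 0.0),
    ("EYE MORPHOGENESIS", 0.0, 0.0038307),
    ("LXR-MEDIATED SIGNALING", 0.0, 0.0087138),
    ("CAMERA-TYPE EYE MORPHOGENESIS", 0.0017182, 0.0113972),
]

kept = [name for name, p, q in rows if passes_cutoffs(p, q)]
print(kept)
```

With these cutoffs the first two gene sets are kept and the last two are dropped because their q-values exceed 0.005, which is how a stricter q-value reduces the size of the network.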

Annotate Network

AutoAnnotate was used for this step with the default parameters. The detailed parameter settings are listed below:

  • Cluster Source: clusterMaker2
  • ClusterMaker Algorithm: MCL Cluster
  • Edge Attribute: EnrichmentMap::similarity_coefficient
  • Label Maker: WordCloud: Adjacent Words(default)
  • Max Words Per Label: 3
  • Word Adjacency Bonus: 8
  • Normalization Factor: 5
  • Attribute Names: [EnrichmentMap::GS_DESCR]
  • Display style: Clustered-Standard
  • Max Words per Cloud: 250
  • Cluster Cutoff: 1.0
  • Min Word Occurrence: 1

Additional Information about the EnrichmentMap:

Node size corresponds to gene set size. Red nodes correspond to upregulated gene sets and blue nodes to downregulated gene sets. Node labels are the gene set descriptions. Edge thickness corresponds to the similarity coefficient: the more genes two nodes share, the thicker the edge.
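To illustrate how shared genes translate into an edge weight, the sketch below computes two common gene set similarity measures, the Jaccard coefficient and the overlap coefficient. Which measure (or combination) was used for this map is not stated in the report, so both are shown as assumptions; the function names are hypothetical and the two small gene sets are made up from gene symbols that appear in the common-gene table.

```python
def jaccard(a, b):
    """Shared genes divided by the size of the union of the two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Made-up gene sets for illustration only.
set_one = {"RPS18", "RPL27", "RPS3"}
set_two = {"RPS18", "RPL27", "RPL5", "RPL8"}

print("Jaccard:", jaccard(set_one, set_two))
print("Overlap:", overlap_coefficient(set_one, set_two))
```

Both measures grow with the number of shared genes, which is why gene sets with many genes in common are drawn with thicker edges and tend to end up in the same cluster.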

Publication-ready figure

Theme Network

Interpretation

Similar to the results from assignment 2, the GSEA and enrichment map results support the original paper.

According to the original paper (Petrossian et al. 2018), the ER treatment is important for an increased CDK4/6 response in HR+ breast cancer. In our enrichment map, proteasome degradation is an important theme among the upregulated enrichment results. Proteasome degradation leads to inhibited proteasome activity, which could increase tumour killing by decreasing the concentration of P-glycoprotein in cell membranes (Orlowski and Dees 2002). In addition, mitochondrial translational elongation is also an upregulated term. Since mitochondrial dysfunction is associated with increased aggressiveness of breast cancer (Lunetti et al. 2019), upregulated translational elongation would be expected to benefit breast cancer patients. Moreover, estrogen response, which is positively related to cancer cell growth (Petrossian et al. 2018), is downregulated.

Therefore, the conclusion of the original paper is supported by our analysis.

Specific pathway

Here I chose proteasome degradation. There are 2 reasons for choosing this pathway. First, it is a major pathway that is clearly related to increased tumour killing. Second, its gene set is relatively large.

Here I used GeneMANIA, with automatic network weighting (the default) and a maximum of 0 resultant genes. Predicted links carry the most weight in the resulting network, but I chose not to show this type of link since I am more interested in confirmed relationships. In addition, I removed co-localization links.

legend


I examined the highest-ranked gene nodes, which are UBA7, IFNG, HLA-G, and PSMD5. UBA7 can serve as a marker for breast cancer patients, since UBA7 expression is significantly reduced in breast cancer (Lin et al. 2020); in our samples, however, UBA7 expression is increased. IFNG helps cancer therapy by facilitating tumor clearance and tumor escape (Ni and Lu 2018), and therefore also has a positive effect in breast cancer therapies. HLA-G is rarely found in breast cancer tissues (Palmisano et al. 2002), but here it is highly upregulated, which indicates that the experimental treatment is related to a positive effect on breast cancer. Finally, PSMD5 encodes an essential 26S subunit of the proteasome; since we observed a highly upregulated score for proteasome degradation, it is reasonable that more proteasome is synthesised. The examination of proteasome degradation is therefore consistent with what I found in the enrichment map.

References

Lin, Meng, Yanqing Li, Shanshan Qin, Yan Jiao, and Fang Hua. 2020. “Ubiquitin-Like Modifier-Activating Enzyme 7 as a Marker for the Diagnosis and Prognosis of Breast Cancer.” Oncology Letters 19 (4). Spandidos Publications: 2773–84.

Lunetti, Paola, Mariangela Di Giacomo, Daniele Vergara, Stefania De Domenico, Michele Maffia, Vincenzo Zara, Loredana Capobianco, and Alessandra Ferramosca. 2019. “Metabolic Reprogramming in Breast Cancer Results in Distinct Mitochondrial Bioenergetics Between Luminal and Basal Subtypes.” The FEBS Journal 286 (4). Wiley Online Library: 688–709.

Merico, Daniele, Ruth Isserlin, Oliver Stueker, Andrew Emili, and Gary D Bader. 2010. “Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation.” PloS One 5 (11). Public Library of Science.

Ni, Ling, and Jian Lu. 2018. “Interferon Gamma in Cancer Immunotherapy.” Cancer Medicine 7 (9). Wiley Online Library: 4509–16.

Orlowski, Robert Z, and E Claire Dees. 2002. “The Role of the Ubiquitination-Proteasome Pathway in Breast Cancer: Applying Drugs That Affect the Ubiquitin-Proteasome Pathway to the Therapy of Breast Cancer.” Breast Cancer Research 5 (1). Springer: 1.

Palmisano, Giulio Lelio, Maria Pia Pistillo, Paolo Fardin, Paolo Capanni, Guido Nicolò, Sandra Salvi, Bruno Spina, Gennaro Pasciucco, and Giovanni Battista Ferrara. 2002. “Analysis of HLA-G Expression in Breast Cancer Tissues.” Human Immunology 63 (11). Elsevier: 969–76.

Petrossian, Karineh, Noriko Kanaya, Chiao Lo, Pei-Yin Hsu, Duc Nguyen, Lixin Yang, Lu Yang, et al. 2018. “ERα-Mediated Cell Cycle Progression Is an Important Requisite for CDK4/6 Inhibitor Response in HR+ Breast Cancer.” Oncotarget 9 (45). Impact Journals, LLC: 27736.

Subramanian, Aravind, Pablo Tamayo, Vamsi K Mootha, Sayan Mukherjee, Benjamin L Ebert, Michael A Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43). National Acad Sciences: 15545–50.